AMENDMENTS TO THE CLAIMS:



Docket No.: 200402117-02 US (1509-477) 



This listing of claims will replace all prior versions and listings of claims in the 
application: 

Listing of Claims : 

1. (currently amended) A method of clustering documents or patterns each having 
one or plural document or pattern segments in an input document or pattern set, said method 
comprising: 

(a) obtaining a co-occurrence matrix for each input document, and obtaining an input
document or pattern frequency matrix for the set of input documents or patterns, based on
occurrence frequencies of terms or term pairs appearing in the set of input documents;

(b) selecting a seed document from a set of remaining documents that are not
included in any cluster existing at that moment, and constructing a current cluster of an
initial state based on the seed document, wherein said selecting and constructing comprise:

(b-1) constructing a remaining document common co-occurrence matrix
for the set of the remaining documents based on a product of corresponding
components of the co-occurrence matrices of all documents in the set of
remaining documents;

(b-2) obtaining a document commonality of each remaining document to 
the set of the remaining documents based on a product sum between every 
component of the co-occurrence matrix of each remaining document and the 
corresponding component of the remaining document common co-occurrence 
matrix; 

(b-3) extracting, as the seed document, the document having the highest
document commonality to the set of the remaining documents; and

(b-4) constructing the initial cluster by including the seed document and 
neighbor documents similar to the seed document; 
(c) obtaining the document or pattern commonality to the current cluster for each
document or pattern in the input document or pattern set by using information based on the
document or pattern frequency matrix for the input document or pattern set, information based
on the document or pattern frequency matrix for documents or patterns in the current cluster and
information based on a common co-occurrence matrix of the current cluster, and making
documents, which have the document commonality to the current cluster higher than a
threshold, belong temporarily to the current cluster; wherein said making comprises:

(c-1) constructing a current cluster common co-occurrence matrix for the
current cluster and a current cluster document frequency matrix of the current 
cluster based on occurrence frequencies of terms or term pairs appearing in the 
documents of the current cluster; 

(c-2) obtaining a distinctiveness value of each term and each term pair 
for the current cluster by comparing the input document frequency matrix with the 
current cluster document frequency matrix; 

(c-3) obtaining weights of each term and each term pair from their 
distinctiveness values; 

(c-4) obtaining a document commonality to the current cluster for each 
document in the input document set based on a product sum between every 
component of the co-occurrence matrix of the input document and the 
corresponding component of the current cluster common co-occurrence matrix
while applying the respective weights to said components; and

(c-5) making documents having the document commonality to the 
current cluster higher than the threshold belong temporarily to the current cluster;

(d) repeating step (c) until the number of documents or patterns temporarily
belonging to the current cluster becomes the same as that in the previous repetition;

(e) repeating steps (b) through (d) until a given convergence condition is satisfied; 

and 

(f) deciding, on the basis of the document or pattern commonality of each document 
or pattern to each cluster, a cluster to which each document or pattern belongs and outputting 
said cluster. 
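
For illustration only, and not as part of the claims, the seed selection of steps (b-1) to (b-3) can be sketched in Python. The function name `select_seed`, the list-of-lists matrix representation, and the use of a raw, unnormalized product sum are assumptions made for this sketch; the claim does not fix any of them.

```python
def select_seed(matrices):
    """Pick the document with the highest product-sum commonality.

    matrices: one square co-occurrence matrix (list of lists) per
    remaining document. Returns (seed_index, scores).
    """
    size = len(matrices[0])

    # (b-1) common co-occurrence matrix: component-wise product over documents.
    common = [[1] * size for _ in range(size)]
    for s in matrices:
        for m in range(size):
            for n in range(size):
                common[m][n] *= s[m][n]

    # (b-2) commonality: product sum between each document's matrix and
    # the common matrix (no normalization or weighting in this sketch).
    scores = [
        sum(s[m][n] * common[m][n] for m in range(size) for n in range(size))
        for s in matrices
    ]

    # (b-3) the seed is the document with the highest commonality.
    seed_index = max(range(len(matrices)), key=scores.__getitem__)
    return seed_index, scores


# Hypothetical co-occurrence matrices for three remaining documents.
remaining = [
    [[2, 1], [1, 1]],
    [[3, 2], [2, 2]],
    [[1, 1], [1, 3]],
]
seed, seed_scores = select_seed(remaining)
```

Step (b-4), which gathers neighbor documents similar to the seed, is left out because the claim does not state the similarity measure.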

2. (currently amended) A clustering method according to claim 1, wherein said step 
(a) further includes: 

(a-1) generating an input document or pattern segment vector for each of said input
document or pattern segments based on occurrence frequencies of terms appearing in each input
document or pattern segment;

(a-2) obtaining a co-occurrence matrix for each input document from the document or
pattern segment vectors; and

(a-3) obtaining an input document or pattern frequency matrix from the co-occurrence
matrix for each document.

3-4. (canceled) 

5. (currently amended) A clustering method according to claim 1, wherein:

the convergence condition in said repeating step (e) is satisfied when (i) the number
of documents or patterns whose document or pattern commonalities to any current clusters are
less than a threshold becomes 0, or (ii) the number is less than a threshold and is equal to that of
the previous repetition.



6. (currently amended) A clustering method according to claim 1, wherein said 
step (f) further includes: 
checking existence of a redundant cluster, and removing, when the redundant cluster
exists, the redundant cluster and again deciding the cluster to which each document belongs. 

7. (currently amended) A method of clustering documents or patterns each having 
one or plural document or pattern segments in an input document or pattern set, said method 
comprising: 

(a) obtaining a co-occurrence matrix S^r for each input document and a document or
pattern frequency matrix for the set of input documents or patterns, based on occurrence frequencies
of terms or term pairs appearing in the set of input documents;

(b) selecting a seed document from a set of remaining documents that are not
included in any cluster existing at that moment, and constructing a current cluster of an
initial state based on the seed document, wherein said selecting and constructing comprise:

(b-1) constructing a remaining document common co-occurrence matrix
for the set of the remaining documents based on the co-occurrence matrices of
all documents in the set of remaining documents;

(b-2) obtaining a document commonality of each remaining document to
the set of the remaining documents based on the co-occurrence matrix S^r of each
remaining document and the remaining document common co-occurrence matrix;

(b-3) extracting, as the seed document, the document having the highest
document commonality to the set of the remaining documents; and

(b-4) constructing the initial cluster by including the seed document and
neighbor documents similar to the seed document;

(c) obtaining the document or pattern commonality to the current cluster for each
document or pattern in the input document or pattern set by using information based on the
document or pattern frequency matrix for the input document or pattern set, information based
on the document or pattern frequency matrix for documents or patterns in the current cluster and
information based on a common co-occurrence matrix of the current cluster, and making
documents or patterns having the document commonality higher than a threshold belong
temporarily to the current cluster;

(d) repeating step (c) until the number of documents or patterns temporarily
belonging to the current cluster becomes the same as that in the previous repetition;

(e) repeating steps (b) through (d) until a given convergence condition is satisfied; 

and 

(f) deciding, on the basis of the document or pattern commonality of each document 
or pattern to each cluster, a cluster to which each document or pattern belongs and outputting 
said cluster; 

wherein in step (a), each mn component S^r_{mn} of the co-occurrence matrix S^r of the
document or pattern D_r is determined in accordance with:

S^r_{mn} = \sum_{y=1}^{Y_r} d_{rym} d_{ryn}

where:

m and n denote the m-th and n-th terms, respectively, among M terms appearing in the
set of input documents,

D_r is the r-th document or pattern in a document or pattern set D consisting of R
documents,

Y_r is the number of document or pattern segments in document or pattern D_r,

d_{rym} and d_{ryn} denote the existence or absence of the m-th and n-th terms,
respectively, in the y-th document segment of document D_r, and

S^r_{mm} represents the number of document segments in which the m-th term occurs and
S^r_{mn} represents the co-occurrence counts of document segments in which the m-th and
n-th terms co-occur.
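
For illustration only, and not as part of the claims, the co-occurrence matrix of claim 7 can be sketched in Python. The function name `cooccurrence_matrix` and the representation of each document segment as a list of 0/1 term indicators are assumptions made for this sketch.

```python
def cooccurrence_matrix(segments):
    """Return S with S[m][n] = sum over segments y of d[y][m] * d[y][n].

    segments: list of document segments, each a list of 0/1 indicators
    (1 if the term occurs in that segment, 0 otherwise).
    """
    num_terms = len(segments[0])
    s = [[0] * num_terms for _ in range(num_terms)]
    for d in segments:
        for m in range(num_terms):
            if not d[m]:
                continue  # term m absent: contributes nothing to row m
            for n in range(num_terms):
                s[m][n] += d[m] * d[n]
    return s


# Hypothetical document with three segments over a three-term vocabulary.
example_segments = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]
example_s = cooccurrence_matrix(example_segments)
```

With binary indicators, each diagonal entry counts the segments containing that term and each off-diagonal entry counts the segments in which the two terms co-occur, which matches the wording at the end of the claim.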

8. (canceled) 

9. (currently amended) A method according to claim 7, wherein in step (b-1), the
remaining document common co-occurrence matrix T^A is determined on the basis of a matrix T;
wherein

the matrix T has an mn component determined by

T_{mn} = \prod_{r=1}^{R} S^r_{mn}, and

the matrix T^A has an mn component determined by

T^A_{mn} = T_{mn}, U_{mn} > A,

T^A_{mn} = 0 otherwise,

where

U_{mn} represents the mn component of a document frequency matrix of the set of
remaining documents, wherein U_{mm} denotes the number of remaining documents in which
the m-th term occurs and U_{mn} denotes the number of remaining documents in which the
m-th and n-th terms co-occur; and

A denotes a predetermined threshold.
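
For illustration only, and not as part of the claims, the matrices T, U and T^A of claim 9 can be sketched in Python. The function names are hypothetical. The sketch takes the product literally over all remaining documents and uses a strict comparison with the threshold; whether zero components are skipped in the product, and whether the comparison is strict, cannot be confirmed from the claim text as transcribed.

```python
def common_cooccurrence(matrices):
    """T[m][n] = product over documents r of S_r[m][n]."""
    size = len(matrices[0])
    t = [[1] * size for _ in range(size)]
    for s in matrices:
        for m in range(size):
            for n in range(size):
                t[m][n] *= s[m][n]
    return t


def document_frequency(matrices):
    """U[m][n] = number of documents in which S_r[m][n] is nonzero."""
    size = len(matrices[0])
    return [
        [sum(1 for s in matrices if s[m][n] > 0) for n in range(size)]
        for m in range(size)
    ]


def apply_threshold(t, u, a):
    """T_A[m][n] = T[m][n] where U[m][n] > a, and 0 otherwise."""
    size = len(t)
    return [
        [t[m][n] if u[m][n] > a else 0 for n in range(size)]
        for m in range(size)
    ]


# Hypothetical co-occurrence matrices for two remaining documents.
docs = [
    [[3, 2], [2, 2]],
    [[1, 0], [0, 2]],
]
t = common_cooccurrence(docs)
u = document_frequency(docs)
t_a = apply_threshold(t, u, 1)
```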

10. (currently amended) A method according to claim 9, further comprising:
determining a modified common co-occurrence matrix Q^A on the basis of T^A; and
in step (b-2), obtaining the document commonality of each remaining document to the set
of the remaining documents based on the co-occurrence matrix of each remaining document
and the modified common co-occurrence matrix Q^A;

the matrix Q^A having an mn component determined by

Q^A_{mn} = \log T^A_{mn}, T^A_{mn} > 1,

Q^A_{mn} = 0 otherwise.
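
For illustration only, and not as part of the claims, the modified common co-occurrence matrix of claim 10 can be sketched in Python. The function name is hypothetical, and the natural logarithm is an assumption, since the claim does not state the base.

```python
import math


def modified_common_cooccurrence(t_a):
    """Q[m][n] = log(T_A[m][n]) where T_A[m][n] > 1, and 0 otherwise."""
    return [[math.log(v) if v > 1 else 0.0 for v in row] for row in t_a]


# Hypothetical thresholded common co-occurrence matrix.
q_a = modified_common_cooccurrence([[3, 0], [1, 4]])
```

Components equal to 0 or 1 both map to 0, so only term pairs whose product of counts exceeds 1 contribute to the commonality.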

11. (currently amended) A method according to claim 10, wherein in step (b-2),
^H ffl and are respectively weights for a term or object feature m and a term or object 
feature pair m, n, and 

the document or pattern commonality of each remaining document or pattern P having
a co-occurrence matrix with respect to the set of remaining documents is
given by



com(D\P-Q')= ^-'^-^ 



12. (currently amended) A method according to claim 10, wherein in step (b-2),

affl ffl and are respectively weights for a term or object feature m and a term or object 
feature pair m, n, and 

the document or pattern commonality of each remaining document or pattern P having
a co-occurrence matrix with respect to the set of remaining documents is
given by

com (D',P;T'')



13-22. (canceled) 

23. (Original) A computer arranged to perform the method of claim 1 . 

24. (Original) A computer arranged to perform the method of claim 2. 
25-26. (canceled) 

27. (Original) A computer arranged to perform the method of claim 5. 

28. (Original) A computer arranged to perform the method of claim 6. 



29. (currently amended) A clustering apparatus for clustering documents
each having one or plural document or pattern segments in an input document or pattern set, the
apparatus comprising:

a first unit for obtaining a co-occurrence matrix for each input document, and obtaining an
input document or pattern frequency matrix for the set of input documents, based on
occurrence frequencies of terms or term pairs appearing in the set of input documents;

a second unit for selecting a seed document from a set of remaining documents
that are not included in any cluster existing at that moment and constructing a current
cluster of an initial state based on the seed document, said second unit being configured for:

constructing a remaining document common co-occurrence matrix for the
set of the remaining documents based on a product of corresponding components
of the co-occurrence matrices of all documents in the set of remaining documents;

obtaining a document commonality of each remaining document to the set 
of the remaining documents based on a product sum between every component of 
the co-occurrence matrix of each remaining document and the corresponding 
component of the remaining document common co-occurrence matrix; 

extracting, as the seed document, the document having the highest document
commonality to the set of the remaining documents; and

constructing the initial cluster by including the seed document and 
neighbor documents similar to the seed document; 
a third unit

for obtaining the document or pattern commonality to the current cluster
for each document or pattern in the input document or pattern set using
information based on the document or pattern frequency matrix for the input
document or pattern set, information based on the document or pattern frequency
matrix for documents or patterns in the current cluster and information based on a
common co-occurrence matrix of the current cluster, and

for making documents, which have the document commonality to the
current cluster higher than a threshold, belong temporarily to the current
cluster; wherein said third unit is configured for:

constructing a current cluster common co-occurrence matrix for the 
current cluster and a current cluster document frequency matrix of the current 
cluster based on occurrence frequencies of terms or term pairs appearing in the 
documents of the current cluster; 
obtaining a distinctiveness value of each term and each term pair for the
current cluster by comparing the input document frequency matrix with the 
current cluster document frequency matrix; 

obtaining weights of each term and each term pair from their
distinctiveness values; 

obtaining a document commonality to the current cluster for each 
document in the input document set based on a product sum between every 
component of the co-occurrence matrix of the input document and the 
corresponding component of the current cluster common co-occurrence matrix 
while applying the respective weights to said components; and 

making documents having the document commonality to the current 
cluster higher than the threshold belong temporarily to the current cluster; 
a fourth unit for repeating the operations of the third unit until the number of documents
or patterns temporarily belonging to the current cluster becomes the same as that in the previous
repetition;

a fifth unit for repeating the operations of the second through fourth units until given 
convergence conditions are satisfied; and 

a sixth unit for deciding, on the basis of the document or pattern commonality of each
document or pattern to each cluster, a cluster to which each document or pattern belongs, and for
outputting said cluster.

30. (currently amended) A clustering apparatus according to claim 29, wherein the
remaining document common co-occurrence matrix or the current cluster common co-occurrence
matrix reflects co-occurrence frequencies at which pairs of different terms co-occur in each
document of the remaining documents or the current cluster, respectively.

31. (currently amended) A method according to claim 1, wherein the remaining
document common co-occurrence matrix or the current cluster common co-occurrence matrix
reflects co-occurrence frequencies at which pairs of different terms co-occur in each document
of the remaining documents or the current cluster, respectively.